Learning Efficient Anomaly Detectors from K-NN Graphs

نویسندگان

  • Jonathan Root
  • Jing Qian
  • Venkatesh Saligrama
چکیده

We propose a non-parametric anomaly detection algorithm for high dimensional data. We score each datapoint by its average KNN distance, and rank them accordingly. We then train limited complexity models to imitate these scores based on the max-margin learning-to-rank framework. A test-point is declared as an anomaly at α-false alarm level if the predicted score is in the α-percentile. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate α, its decision region converges to the α-percentile minimum volume level set of the unknown underlying density. In addition, we test both the statistical performance and computational efficiency of our algorithm on a number of synthetic and realdata experiments. Our results demonstrate the superiority of our algorithm over existing K-NN based anomaly detection algorithms, with significant computational savings.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient anomaly detection using bipartite k-NN graphs

Learning minimum volume sets of an underlying nominal distribution is a very effective approach to anomaly detection. Several approaches to learning minimum volume sets have been proposed in the literature, including the K-point nearest neighbor graph (K-kNNG) algorithm based on the geometric entropy minimization (GEM) principle [4]. The K-kNNG detector, while possessing several desirable chara...

متن کامل

Learning Minimum Volume Sets and Anomaly Detectors from KNN Graphs

We propose a non-parametric anomaly detection algorithm for high dimensional data. We first rank scores derived from nearest neighbor graphs on n-point nominal training data. We then train limited complexity models to imitate these scores based on the max-margin learningto-rank framework. A test-point is declared as an anomaly at α-false alarm level if the predicted score is in the α-percentile...

متن کامل

X Less is More: Building Selective Anomaly Ensembles

Ensemble learning for anomaly detection has been barely studied, due to difficulty in acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have been studied and effectively used for long. Our work taps into this gap and builds a new ensemble approach for anomaly detection, with application to event detection in t...

متن کامل

Scalable $k$-NN graph construction

The k-NN graph has played a central role in increasingly popular data-driven techniques for various learning and vision tasks; yet, finding an efficient and effective way to construct k-NN graphs remains a challenge, especially for large-scale high-dimensional data. In this paper, we propose a new approach to construct approximate k-NN graphs with emphasis in: efficiency and accuracy. We hierar...

متن کامل

A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data

Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples are similar and eac...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1502.01783  شماره 

صفحات  -

تاریخ انتشار 2015